Local structure of directed networks 
Previous work on undirected small-world networks established the paradigm that locally structured networks tend to have a high density of short loops. On the other hand, many realistic networks are directed. Here we investigate the local organization of directed networks and find, surprisingly, that real networks often have very few short loops as compared to random models. We develop a theory and derive conditions for determining if a given network has more or fewer loops than its randomized counterparts. These findings carry broad implications for structural and dynamical processes sustained by directed networks.

Asymmetric interactions are widespread in natural and technological networks, particularly when the network transports a flow or underlies collective behavior [1]. The structure of such directed networks can be characterized by the statistics of loops, the building blocks of closed paths, which provide information on structural correlations, motifs, robustness and redundancy of pathways, and impact dynamical as well as equilibrium critical phenomena on the network.

In undirected networks, the large number of short loops together with small diameter gives rise to the small-world effect encountered in many real systems. Strikingly, in this Letter we show that there is a large class of directed networks for which the number of loops is strongly reduced with respect to the random hypothesis. The directed neural network of C. elegans, for example, has less than 50% of the short loops expected from a random ensemble with the same degree sequence despite the well-known fact that, when regarded as an undirected network, it has a clustering coefficient 5.6 times larger than randomly rewired versions of the network.

Motivated by this empirical finding, we demonstrate numerically and analytically that degree correlations strongly constrain the loop structure of directed networks. Moreover, we go beyond the degree-correlated picture and derive conditions for determining if a given network has more or fewer loops than its randomized counterparts. We characterize the network's local organization in terms of short loops and its global organization in terms of long loops. We compare our analytical results with exact (when possible) or approximate numerical calculation of the number of loops in a class of directed networks that includes foodweb, power-grid, metabolic, neural, transcription and WWW networks. Our finding that many directed networks are underlooped may have broad implications given that such networks exhibit, for example, improved stability in foodweb systems and enhanced synchronization [7] and transportation properties in various other systems [8].

Short Loops in Random Networks. We first derive the expected number of self-avoiding loops in directed random networks. The general way to construct random uncorrelated undirected networks is by means of the Molloy-Reed model. Given a set of nodes $V = \{i : 1, \ldots, N\}$, the construction is based on generating a sequence of degrees $\{k_i\}$ from a given degree distribution $P(k)$ with a structural cutoff $K = O(N^{1/2})$, and randomly connecting the links. In this ensemble, the expected number $\mathcal{N}_L$ of short loops of length $L$ is given by [13, 14]

$$E_{\rm undir}(\mathcal{N}_L) \approx \frac{1}{2L} \left( \frac{\langle k(k-1) \rangle}{\langle k \rangle} \right)^L. \qquad (1)$$

This formula implies that a network with diverging $\langle k^2 \rangle$ has many more short loops than networks with finite $\langle k^2 \rangle$. In particular, scale-free networks with scaling exponent $\gamma < 3$ have many short loops while Erdős-Rényi networks have a negligible number of short loops in the $N \to \infty$ limit. We now show that this expression can be generalized to random directed networks. We again consider the Molloy-Reed construction but in this case we draw a sequence of incoming and outgoing links $\{(k_{in}^i, k_{out}^i)\}$ from a degree distribution $P(k_{in}, k_{out})$ for all nodes $i \in V$. This distribution, which is not factorizable in general, describes correlated variables $k_{in}$ and $k_{out}$ at any given node. For directed uncorrelated networks, the structural cutoffs for in- and out-degrees satisfy $K_{in} K_{out} < \langle k_{in} \rangle N$. Proceeding as in the undirected case, we obtain that the expected number of loops of size $L$ in the directed network ensemble is given by

$$E_{\rm dir}(\mathcal{N}_L) \approx \frac{1}{L} \left( \frac{\langle k_{in} k_{out} \rangle}{\langle k_{in} \rangle} \right)^L, \qquad (2)$$



where this approximate expression is valid for large $N$ and loop length satisfying $L < N \langle k_{in} k_{out} \rangle^2 / \langle (k_{in} k_{out})^2 \rangle$. For undirected networks, $E_{\rm dir}(\mathcal{N}_L)$ reduces to $E_{\rm undir}(\mathcal{N}_L)$ because the incoming connectivity is $k$ and the outgoing connectivity at the end point of a link (on a self-avoiding loop) is $k - 1$. The only difference is a factor 2, which accounts for the orientation on the loops in Eq. (1).
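
As a quick numerical reading of the two ensemble averages, the following sketch evaluates them from degree sequences. It assumes the forms $E_{\rm undir} \approx (1/2L)(\langle k(k-1) \rangle / \langle k \rangle)^L$ and $E_{\rm dir} \approx (1/L)(\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle)^L$ for Eqs. (1) and (2); the function names and the toy degree sequence are illustrative and not taken from the paper.

```python
from statistics import mean

def expected_loops_undirected(degrees, L):
    """Eq. (1) as assumed here: (1/2L) * (<k(k-1)>/<k>)**L."""
    ratio = mean(k * (k - 1) for k in degrees) / mean(degrees)
    return ratio ** L / (2 * L)

def expected_loops_directed(k_in, k_out, L):
    """Eq. (2) as assumed here: (1/L) * (<k_in k_out>/<k_in>)**L."""
    ratio = mean(a * b for a, b in zip(k_in, k_out)) / mean(k_in)
    return ratio ** L / L

# Toy check of the reduction discussed in the text: on an undirected loop a
# node of degree k is entered through one link and can be left through the
# remaining k - 1 links, so we substitute k_in -> k and k_out -> k - 1.
degrees = [1, 2, 2, 3, 4, 6]          # hypothetical degree sequence
k_out = [k - 1 for k in degrees]
undirected = expected_loops_undirected(degrees, L=3)
directed = expected_loops_directed(degrees, k_out, L=3)
print(undirected, directed)  # the directed value is twice the undirected one
```

With this substitution the two expressions differ only by the orientation factor 2, which is the statement made in the text.
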

We observe from Eq. (2) that, in directed networks, the one-point correlation between the number of incoming and outgoing links modulates the expected number of short loops. Indeed, if $k_{in}$ and $k_{out}$ on the same nodes are not correlated, then the number of short loops is strongly reduced as compared to the case when $k_{in}$ and $k_{out}$ are positively correlated. The Barabási-Albert (BA) networks, for example, have small degree correlations and are within the scope of $E_{\rm dir}(\mathcal{N}_L)$ and $E_{\rm undir}(\mathcal{N}_L)$ for uncorrelated random networks [13]. If we consider the undirected BA model, we find that the networks have many short loops compared to random Erdős-Rényi networks (in fact $\langle k(k-1) \rangle \sim \log(N)$). In contrast, if we consider the directed version of the BA model (in which the incoming links are linked preferentially, and hence $\langle k_{in} k_{out} \rangle = \langle k_{in} \rangle \langle k_{out} \rangle$), the networks have a negligible number of short loops just as the Erdős-Rényi networks in the $N \to \infty$ limit.
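
The role of the one-point correlation can be illustrated with a small sketch that keeps the in- and out-degree values fixed and only changes how they are paired on nodes. The degree values below are hypothetical, and the loop estimate assumes the directed ensemble average has the form $(1/L)(\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle)^L$.

```python
from statistics import mean

def loop_ratio(k_in, k_out):
    """<k_in k_out>/<k_in>, the base of the power in the directed estimate."""
    return mean(a * b for a, b in zip(k_in, k_out)) / mean(k_in)

# Hypothetical heterogeneous degree values (same total in- and out-degree).
k_in = [1, 1, 2, 2, 3, 9]
k_out = [1, 1, 2, 2, 3, 9]

# Same degree values, paired on nodes in different ways.
correlated = loop_ratio(sorted(k_in), sorted(k_out))
anticorrelated = loop_ratio(sorted(k_in), sorted(k_out, reverse=True))
# Factorized case: <k_in><k_out>/<k_in> = <k_out>.
uncorrelated = mean(k_out)

L = 4
for name, r in [("correlated", correlated),
                ("uncorrelated", uncorrelated),
                ("anticorrelated", anticorrelated)]:
    print(f"{name:>14}: ratio = {r:.2f}, estimated loops of length {L} = {r**L / L:.1f}")
```

Because the estimate grows as the $L$-th power of the ratio, even a moderate change in $\langle k_{in} k_{out} \rangle$ changes the expected number of short loops substantially.
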

Short Loops in a Given Network. A different approach is needed for counting the loops of a specific directed network, as required in the study of real systems. In this case, as in the case of undirected networks, the number of short loops can be expressed in terms of powers of the adjacency matrix. In particular, the number of (self-avoiding) loops of length $L$ can be expressed as the total number of closed paths of length $L$, i.e. $\mathrm{Tr}\, A^L / L$, minus the closed paths of length $L$ composed of self-intersecting loops. The number of loops $\mathcal{N}_L$ of length $L$ in a network with adjacency matrix $A$ is then given by $1/L$ times a sum over loop compositions $\{L_i\}$ satisfying $\sum_i L_i = L$, in which each term is a product of entries of the matrix powers $A^{L_i}$ weighted by a coefficient $c(\{L_i\})$. Here the sequence $\{L_i\}$ describes the loop composition of the paths for every correction term (for example, in the case $L = 5$ we will find a correction term involving paths composed of $\{L_i\} = \{2, 3\}$ directed loops). The coefficients $c(\{L_i\})$ remain small for small $L$.
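
The relation between closed paths and self-avoiding loops can be checked on a small example. The sketch below compares $\mathrm{Tr}\, A^L / L$ with a brute-force loop count on a hypothetical 4-node graph without self-links; the helper names are illustrative.

```python
from itertools import permutations

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_power(A, L):
    """Tr A^L: closed walks of length L, counted once per starting node."""
    P = A
    for _ in range(L - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))

def count_loops(A, L):
    """Brute-force count of self-avoiding directed loops of length L."""
    found = 0
    for nodes in permutations(range(len(A)), L):
        if nodes[0] != min(nodes):   # fix the starting node: one count per loop
            continue
        if all(A[nodes[i]][nodes[(i + 1) % L]] for i in range(L)):
            found += 1
    return found

# Hypothetical graph: 2-loops 0<->1 and 2<->3, plus the 3-loop 0->1->2->0.
edges = [(0, 1), (1, 0), (1, 2), (2, 0), (2, 3), (3, 2)]
A = [[0] * 4 for _ in range(4)]
for i, j in edges:
    A[i][j] = 1

for L in (2, 3, 4):
    print(L, count_loops(A, L), trace_power(A, L) / L)
```

For $L = 2$ and $L = 3$ the two counts coincide, since without self-links every closed path of that length is self-avoiding. For $L = 4$ the trace still counts the paths that traverse a 2-loop twice, although the graph has no self-avoiding loop of length 4; this is the kind of contribution the correction terms remove.
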

Starting from this general formula we derive upper and lower bounds for the number of loops in a given directed network. The upper bound is simply given by the sum of all closed paths of length $L$, i.e. $\mathcal{N}_L < \frac{1}{L} \mathrm{Tr}\, A^L = \frac{1}{L} \sum_n \lambda_n^L$, where the sum is performed over all the eigenvalues (including multiplicities). To find a lower bound we have to express $\mathcal{N}_L$ in terms of the eigenvalues of the adjacency matrix $A$ and in terms of its Jordan basis. In this way, it follows that $\mathcal{N}_L \approx \mathrm{Tr}\, A^L / L$ provided that $\kappa_L \ll 1$. The quantity $\kappa_L$ is defined as a maximum over sums weighted by binomial coefficients and involves the matrix $P$ of generalized eigenvectors of $A$ in the Jordan decomposition $A = P J P^{-1}$; the primed sum $\sum'$ appearing in it indicates the sum over $m$ over the dimension of each Jordan block with associated eigenvalue $\lambda_j$, under the constraint that indices $j$ and $j + m$ are in the same block. If $\kappa_L \ll 1$, the dominant term in the expansion of $\mathcal{N}_L$ is the one with $\{L_i\} = \{L\}$ and we have $\mathcal{N}_L \approx \frac{1}{L} \sum_n \lambda_n^L$.
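
The identity $\mathrm{Tr}\, A^L = \sum_n \lambda_n^L$ behind the spectral form of the bound can be verified on a case where the spectrum is known in closed form. The sketch below uses a directed ring, chosen here only because its adjacency matrix is a cyclic permutation matrix whose eigenvalues are the $N$-th roots of unity; it is not one of the networks studied in the paper.

```python
import cmath

def trace_power(A, L):
    """Tr A^L by repeated multiplication (pure Python, small matrices only)."""
    n = len(A)
    P = [row[:] for row in A]
    for _ in range(L - 1):
        P = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(P[i][i] for i in range(n))

# Directed ring 0 -> 1 -> ... -> N-1 -> 0.
N = 6
A = [[1 if j == (i + 1) % N else 0 for j in range(N)] for i in range(N)]
eigenvalues = [cmath.exp(2j * cmath.pi * n / N) for n in range(N)]

for L in range(2, 2 * N + 1):
    spectral_sum = sum(lam ** L for lam in eigenvalues)
    # Closed-walk count and eigenvalue sum agree up to rounding error.
    assert abs(spectral_sum - trace_power(A, L)) < 1e-9
    print(L, trace_power(A, L) / L)
```

The ring contains a single loop, of length $N$, and $\mathrm{Tr}\, A^N / N = 1$ reproduces it exactly. At $L = 2N$ the same quantity equals $1/2$ although no self-avoiding loop of that length exists, which shows that $\mathrm{Tr}\, A^L / L$ is in general only an upper bound.
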

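Comparing these results with the result found for the random case in Eq. (2), it follows that a sufficient condition for a specific network to have fewer short loops of length $L$ than its randomized versions is $\sum_n \lambda_n^L < (\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle)^L$. Conversely, if $\kappa_L \ll 1$, a condition for the network to have more loops is $\sum_n \lambda_n^L > (\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle)^L$. For loops in a certain range of values $L \in (1, L_c)$, it is convenient to restate these conditions as

$$\Lambda \equiv \overline{\Big( \sum_n \lambda_n^L \Big)^{1/L}} < \frac{\langle k_{in} k_{out} \rangle}{\langle k_{in} \rangle} \qquad (3)$$

for the network to be under-shortlooped and

$$\Lambda > \frac{\langle k_{in} k_{out} \rangle}{\langle k_{in} \rangle} \quad \text{if} \quad \max_{L \in (1, L_c)} \kappa_L \ll 1 \qquad (4)$$

for the network to be over-shortlooped on average over loop lengths $L \in (1, L_c)$. The over-bar indicates average over $L \in (1, L_c)$ for $L_c$ satisfying the condition for Eq. (2) to be valid.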
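
Conditions (3) and (4) amount to a three-way decision, sketched below on values quoted in Table I and its caption. The sketch assumes that the spectral quantity $\Lambda$ is compared directly with $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$, and it uses a hypothetical numerical threshold as a stand-in for $\kappa \ll 1$; the function name and the threshold are not from the paper.

```python
def classify_short_loops(Lambda, ratio, kappa_max=None, kappa_small=0.5):
    """Return 'und', 'over' or 'undet' following conditions (3) and (4).

    Lambda      : spectral quantity of the real network
    ratio       : <k_in k_out>/<k_in> of the randomized ensemble
    kappa_max   : maximum of kappa_L over the loop lengths considered, if known
    kappa_small : hypothetical numerical stand-in for "kappa << 1"
    """
    if Lambda < ratio:
        return "und"      # condition (3) is sufficient on its own
    if kappa_max is not None and kappa_max < kappa_small:
        return "over"     # condition (4) additionally requires kappa << 1
    return "undet"

# (Lambda, ratio, kappa) as listed in Table I and its caption.
table = {
    "Littlerock FW": (7.93, 11.47, None),
    "Metabolic net.": (3.00, 2.58, 96.0),
    "ND WWW": (153.32, 43.14, 0.2),
    "Transcription net.": (0.88, 0.36, 73.2),
}
for name, (Lambda, ratio, kappa) in table.items():
    print(name, classify_short_loops(Lambda, ratio, kappa))
```

For these four rows the sketch returns und, undet, over and undet, which matches the predicted entries in the short-loop column of Table I.
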

Long Loops. The above analysis applies to short loops. Counting long loops is a difficult problem for which approximate Monte Carlo [16] and statistical mechanics methods have been proposed in the undirected case. To derive a necessary condition for long directed loops to be present, we use percolation predictions [15] for two-point correlated networks, where the out-degree of a node is correlated (beyond the random condition) with the in-degree of the nodes at the end points of its links. These networks are expected to account for the leading correlation term that distinguishes a real network from its uncorrelated random counterparts. In networks with two-point degree correlation, the percolation condition for the largest strongly connected component (LSCC) is $\lambda_C > 1$, where $\lambda_C$ is the largest eigenvalue of the two-point correlation matrix $C_{\mathbf{k}',\mathbf{k}} = k_{out} P(\mathbf{k}'|\mathbf{k})$. A strongly connected component of a network is a set of nodes where each node can reach and be reached by all the others through directed paths. For the uncorrelated random networks of the Molloy-Reed ensemble, the largest eigenvalue of matrix $C$ reduces to the known result $\lambda_C = \langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$. For a specific network, which is not necessarily well approximated by an uncorrelated ensemble average, we can use the approximation $\lambda_C \approx \lambda$, where $\lambda$ denotes the largest eigenvalue of the adjacency matrix. Consequently the percolation conditions for the real and randomized networks are respectively $\lambda > 1$ and $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle > 1$. Since the existence of a giant LSCC is a necessary condition for the network to have long directed loops, long loops are strongly suppressed when $\lambda < 1$.
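
The two percolation criteria can be applied directly to the tabulated quantities. The sketch below assumes the criteria read "largest adjacency eigenvalue greater than 1" for the real network and "$\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$ greater than 1" for its randomized counterpart, and takes the rounded values from Table I at face value; the function name is illustrative.

```python
def percolation_prediction(lam, ratio):
    """Predict a giant LSCC in the real network (lam > 1) and in its
    randomized counterpart (ratio > 1): 'p' = percolating, 'np' = not."""
    real = "p" if lam > 1 else "np"
    randomized = "p" if ratio > 1 else "np"
    return f"{real}-{randomized}"

# (largest adjacency eigenvalue, <k_in k_out>/<k_in>) as listed in Table I.
table = {
    "Chesapeake FW": (2.85, 3.12),
    "Seagrass FW": (1.00, 4.05),
    "Power-grid net.": (1.00, 1.36),
    "Transcription net.": (1.32, 0.36),
}
for name, (lam, ratio) in table.items():
    print(name, percolation_prediction(lam, ratio))
```

With a strict inequality, the rounded eigenvalue 1.00 is classified as non-percolating, and the four rows reproduce the p-p, np-p, np-p and p-np entries of the percolation column.
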

Because percolation only provides a necessary condition for the existence of long loops, we make quantitative predictions using a modified message-passing algorithm based on the belief propagation (BP) algorithm proposed in [17]. The algorithm provides an estimation for the entropy $\sigma(L) = \log(\mathcal{N}_L)/N$ of the loops of length $L$, from which we calculate $\mathcal{N}_L$. Within the conditions discussed in [17], namely that the network is large and has a large number of loops, this algorithm is able to predict the maximal loop length $L_{max}$ reliably. As shown below, however, the BP algorithm correctly predicts the under- or over-looped nature of all networks in our database, including those with a small number of nodes or loops [20], and the results are in very good agreement with the behavior suggested by the relative values between $\lambda$ and $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$.

FIG. 1: (Color online) Number of short directed and undirected loops in several networks, where different symbols correspond to the numerically determined values for the real and random counterparts of the networks. The lines indicate the theoretical predictions in Eqs. (1) and (2) for random networks. Points on the x-axis indicate no loops.

FIG. 2: Underlooped, overlooped and undetermined regions in the $\lambda_{short} = \Lambda / (\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle)$ vs. $\lambda_{long} = L_{max} / \langle L^*_{max} \rangle$ diagram, where $L_{max}$ and $L^*_{max}$ are predicted using the BP algorithm. The points correspond to the predictions for both short and long loops for the networks in Table I, except for the ND WWW, which is over-shortlooped and is not shown because it is difficult to calculate its $\lambda_{long}$. The actual counting of the loops confirms the predictions (Table I).

Real Networks. We consider several real directed networks [21]: (i) Texas power grid; (ii) foodwebs (Chesapeake, Mondego, Littlerock, and Seagrass regions); (iii) metabolic network of E. coli, where the nodes represent metabolites; (iv) Notre Dame University's WWW; (v) C. elegans' neural network; and (vi) transcription network of S. cerevisiae, where the nodes correspond to regulating and regulated genes. Figure 1 shows the distributions of short loops (measured using exact enumeration [22]) for both the directed and undirected versions of four real networks along with the randomized counterparts having the same number of in- and out-links at each node. The randomized networks are well approximated by the theoretical predictions in Eqs. (1) and (2), as indicated by the lines in the figure. Directed networks tend to have fewer loops than undirected networks, as expected. However, while real undirected networks tend to have more loops than random ones, the opposite occurs in the directed case.
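
Randomized counterparts that keep the in- and out-degree of every node can be generated in several ways. The sketch below uses repeated directed edge swaps, a common choice that is assumed here for illustration and is not necessarily the procedure used for the results above; the toy edge list and function names are hypothetical.

```python
import random

def randomize_directed(edges, n_swaps, seed=0):
    """Degree-preserving randomization by directed edge swaps:
    (a -> b, c -> d) becomes (a -> d, c -> b) whenever this creates
    neither a self-link nor a duplicate link."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # rejected swap: keep the graph simple
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def degree_sequences(edges, n):
    k_in, k_out = [0] * n, [0] * n
    for a, b in edges:
        k_out[a] += 1
        k_in[b] += 1
    return k_in, k_out

# Hypothetical simple directed graph on 5 nodes.
original = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2), (0, 3), (4, 1)]
shuffled = randomize_directed(original, n_swaps=200)
print(degree_sequences(original, 5) == degree_sequences(shuffled, 5))
```

Each accepted swap leaves the out-degree of the two source nodes and the in-degree of the two target nodes unchanged, so the quantity $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$ is the same for the real network and for every randomized version.
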

Indeed, six out of the nine directed networks we analyzed are under-shortlooped, as shown in Fig. 2 and Table I. The only exceptions are the metabolic and transcription networks, which are marginally over-shortlooped, and the WWW network, which is the only social network present in our database [23]. These findings are very different from what one would anticipate from previous studies on undirected networks, where highly clustered small-world networks prevail. Figure 2 summarizes the prediction and actual tendency of the directed networks to be underlooped. Table I summarizes the network parameters and results for all directed networks analyzed, where $\Lambda$ is calculated by summing over all loops up to a length cutoff $L_c$ chosen to be 6. Our predictions compare well with direct data analysis.
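
Table I also reports the measured percentage of nodes in the LSCC. A minimal way to obtain such a fraction for a given edge list is sketched below with Kosaraju's two-pass depth-first search, written iteratively; the toy graph and the function name are hypothetical and the routine is not taken from the paper.

```python
def largest_scc_fraction(edges, n):
    """Fraction of the n nodes in the largest strongly connected component."""
    out_adj = [[] for _ in range(n)]
    in_adj = [[] for _ in range(n)]
    for a, b in edges:
        out_adj[a].append(b)
        in_adj[b].append(a)

    # Pass 1: record nodes in order of DFS completion on the original graph.
    seen, order = [False] * n, []
    for root in range(n):
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(out_adj[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append((nxt, iter(out_adj[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # each sweep collects exactly one strongly connected component.
    assigned, largest = [False] * n, 0
    for root in reversed(order):
        if assigned[root]:
            continue
        assigned[root] = True
        size, stack = 0, [root]
        while stack:
            node = stack.pop()
            size += 1
            for prev in in_adj[node]:
                if not assigned[prev]:
                    assigned[prev] = True
                    stack.append(prev)
        largest = max(largest, size)
    return largest / n

# Hypothetical toy graph: triangle 0 -> 1 -> 2 -> 0 feeding a chain 2 -> 3 -> 4.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(largest_scc_fraction(edges, 5))
```

In the toy graph only the triangle is strongly connected, so the function returns 3/5; nodes 3 and 4 can be reached from the triangle but cannot reach it back.
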

Conclusions. We have studied deviations in the loop statistics and provided criteria for determining if a network is underlooped or overlooped compared to its randomized counterparts. Empirical evidence coming from the study of different types of natural and technological networks shows that many of these different networks are under-shortlooped, a surprising result which is in sharp contrast with the tendency of undirected networks to be over-shortlooped. The only socio-technological network in our database, the ND WWW, contains instead very many short loops. We expect that our results will be important and further extended in the study of social, biological and technological systems. In social networks, the abundance of directed loops can be an important factor in the promotion of mutual reinforcement amongst agents [24], while in cellular and neural networks it can play a major role in information processing [25] and regulation [26]. In other systems, the reduced number of directed loops can lead to improved stability and transportation properties, which we hope will stimulate other applications of our findings.

The authors thank Dong-Hee Kim and Marian Boguna 

                      Network Parameters                                                                                   Prediction/Actual (a)
Network               N         M           $\lambda$   $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$   $\Lambda$   Short loops (b)   Percolation/LSCC (c)   Long loops (d)
Littlerock FW         183       2,494       7.00        11.47                                                         7.93        und/und           p-p/(12 vs 92)         und/und*
Chesapeake FW         39        177         2.85        3.12                                                          2.40        und/und           p-p/(41 vs 76)         und/und
Mondego FW            46        400         8.95        9.14                                                          5.86        und/und           p-p/(76 vs 92)         und/und*
Seagrass FW           48        226         1.00        4.05                                                          1.65        und/und           np-p/(0 vs 75)         und/und
Metabolic net.        532       596         2.85        2.58                                                          3.00        undet/over        p-p/(82 vs 94)         undet/undet*
Power-grid net.       4,889     5,855       1.00        1.36                                                          0.88        und/und           np-p/(0.1 vs 33)       und/und
ND WWW                325,729   1,497,135   152.00      43.14                                                         153.32      over/over         p-p/(17 vs 41)         -
Neural net.           306       2,359       9.15        10.49                                                         8.84        und/und           p-p/(78 vs 86)         und/und*
Transcription net.    688       1,079       1.32        0.36                                                          0.88        undet/over        p-np/(0.4 vs 0.3)      over/over

TABLE I. Properties of real directed networks: number of nodes $N$ and links $M$, eigenvalue $\lambda$, $\langle k_{in} k_{out} \rangle / \langle k_{in} \rangle$, and spectral quantity $\Lambda$ (l.h.s. columns); loop structure and percolation properties (r.h.s. columns). The values of $\kappa$ for the undetermined and over-shortlooped cases are 96.0, 73.2 and 0.2 for the metabolic, transcription and WWW network, respectively. (a) Underlooped (und), overlooped (over), undetermined (undet), not determined numerically (-). (b) Actual values determined by averaging over the directed loops up to length $L_c = 6$ ($L_c = 3$ for the ND WWW). (c) From left to right: predicted percolating (p) or non-percolating (np) LSCC in real and random networks together with the actual percentage of nodes in the LSCC of the real vs. random networks. (d) From left to right: prediction for long loops obtained using the BP algorithm to estimate $L_{max}$ and the actual result obtained using exhaustive or partial (*) enumeration of the loops.